Consensus Training for Consensus Decoding in Machine Translation

Authors

  • Adam Pauls
  • John DeNero
  • Dan Klein
Abstract

We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forest-based consensus and minimum Bayes risk decoding methods. Our continuous objective can be optimized using simple gradient ascent. However, computing critical quantities in the gradient necessitates a novel dynamic program, which we also present here. Assuming BLEU as an evaluation measure, our objective function has two principal advantages over standard max-BLEU tuning. First, it specifically optimizes model weights for downstream consensus decoding procedures. An unexpected second benefit is that it reduces overfitting, which can improve test set BLEU scores when using standard Viterbi decoding.
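
To make the idea of "the BLEU score of expected n-gram counts" concrete, here is a minimal sketch, not the authors' implementation. It assumes a tiny n-best list in place of a translation forest, a single reference, and n-grams up to order 2. The function names (`posterior`, `expected_counts`, `consensus_bleu`) and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def posterior(scores):
    """Softmax over model scores (the log-linear distribution over hypotheses)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_counts(hyps, probs, n):
    """Expected count of each n-gram under the hypothesis distribution."""
    exp = Counter()
    for hyp, p in zip(hyps, probs):
        for gram, c in ngrams(hyp, n).items():
            exp[gram] += p * c
    return exp


def consensus_bleu(hyps, feats, weights, ref, max_n=2):
    """BLEU-style score computed from expected n-gram counts (assumed form)."""
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in feats]
    probs = posterior(scores)
    log_prec = 0.0
    for n in range(1, max_n + 1):
        exp = expected_counts(hyps, probs, n)
        ref_counts = ngrams(ref, n)
        # Clip each expected count by the reference count, as BLEU clips counts.
        matched = sum(min(c, ref_counts[g]) for g, c in exp.items())
        total = sum(exp.values())
        if matched == 0 or total == 0:
            return 0.0
        log_prec += math.log(matched / total) / max_n
    # Brevity penalty based on the expected hypothesis length.
    exp_len = sum(p * len(h) for h, p in zip(hyps, probs))
    bp = min(1.0, math.exp(1 - len(ref) / exp_len))
    return bp * math.exp(log_prec)


# Toy data: two hypotheses, one feature each, one reference.
hyps = [["the", "cat", "sat"], ["a", "cat", "sat"]]
feats = [[1.0], [0.0]]
ref = ["the", "cat", "sat"]

uniform = consensus_bleu(hyps, feats, [0.0], ref)
peaked = consensus_bleu(hyps, feats, [5.0], ref)
```

Because the score depends on the weights only through the hypothesis probabilities, it changes continuously as the weights move, which is what makes a gradient-based optimizer applicable. In this toy case, raising the weight shifts probability mass toward the hypothesis that matches the reference, so `peaked` exceeds `uniform`. Over a full forest the expected counts cannot be enumerated this way, which is where the dynamic program mentioned in the abstract comes in.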

Similar resources

Fast Consensus Hypothesis Regeneration for Machine Translation

This paper presents a fast consensus hypothesis regeneration approach for machine translation. It combines the advantages of feature-based fast consensus decoding and hypothesis regeneration. Our approach is more efficient than previous work on hypothesis regeneration, and it explores a wider search space than consensus decoding, resulting in improved performance. Experimental results show cons...

Model Combination for Machine Translation

Machine translation benefits from two types of decoding techniques: consensus decoding over multiple hypotheses under a single model and system combination over hypotheses from different models. We present model combination, a method that integrates consensus decoding and system combination into a unified, forest-based technique. Our approach makes few assumptions about the underlying component...

The RWTH Aachen System for NTCIR-10 PatentMT

This paper describes the statistical machine translation (SMT) systems developed by RWTH Aachen University for the Patent Translation task of the 10th NTCIR Workshop. Both phrase-based and hierarchical SMT systems were trained for the Japanese-English and Chinese-English tasks. Experiments were conducted to compare standard and inverse direction decoding, the performance of several additional m...

Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders

This paper presents collaborative decoding (co-decoding), a new method to improve machine translation accuracy by leveraging translation consensus between multiple machine translation decoders. Different from system combination and MBR decoding, which postprocess the n-best lists or word lattice of machine translation decoders, in our method multiple machine translation decoders collaborate by ...

Learning Translation Consensus with Structured Label Propagation

In this paper, we address the issue for learning better translation consensus in machine translation (MT) research, and explore the search of translation consensus from similar, rather than the same, source sentences or their spans. Unlike previous work on this topic, we formulate the problem as structured labeling over a much smaller graph, and we propose a novel structured label propagation f...

Publication date: 2009